Stateful Replication

Stateful replication is the process of replicating components or services that manage internal state or data (like databases, caches, session-aware services) to provide fault tolerance, high availability, and consistency.

Unlike stateless systems, each replica in a stateful system must maintain or synchronize internal state, which introduces complexity in replication, especially regarding consistency, failover, and synchronization.

Key Goals of Stateful Replication

  • Ensure no data loss even during node failure
  • Provide read/write availability across replicas
  • Maintain data consistency across replicated instances
  • Enable automatic failover and recovery

Components That Require Stateful Replication

  • Databases (PostgreSQL, MySQL, Cassandra)
  • Message Queues (Kafka, RabbitMQ)
  • Stateful Microservices (e.g., real-time game servers)
  • File Storage Systems (like HDFS, Ceph)

Common Replication Models for Stateful Systems

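Model           | Description
----------------+--------------------------------------------------------------
Primary-Replica | One leader handles writes; replicas sync and serve reads (may lag behind).
Multi-Leader    | Multiple nodes handle writes and must sync with each other (conflict-prone).
Quorum-based    | A write commits once a majority (quorum) of nodes accepts it; consensus protocols (e.g., Paxos, Raft) coordinate the agreement.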
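
To make the quorum idea concrete, here is a minimal arithmetic sketch. The function names are illustrative and not taken from any library. The majority rule matches how Raft-style systems size a quorum; the `r + w > n` check is the read/write overlap rule used by Dynamo-style stores, included here as a related technique rather than something Paxos or Raft expose directly.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return n - majority(n)


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares at least one node with every write quorum."""
    return r + w > n


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), tolerated_failures(n))
```

Note that going from 3 to 4 nodes does not raise the number of tolerated failures, which is one reason odd cluster sizes are common.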

Flow of Stateful Replication

  • Client writes go to the primary node.
  • Primary persists the state and sends updates to replicas.
  • Replicas acknowledge the update.
  • If the primary fails, a replica is promoted to become the new primary.
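
The four steps above can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not how any particular database implements replication: `Node`, `Cluster`, and the "promote the replica with the longest log" rule are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    log: list = field(default_factory=list)  # ordered list of applied updates
    alive: bool = True

    def apply(self, update: str) -> bool:
        """Persist an update; the return value acts as the acknowledgement."""
        if not self.alive:
            return False
        self.log.append(update)
        return True


class Cluster:
    def __init__(self, primary: Node, replicas: list):
        self.primary = primary
        self.replicas = replicas

    def write(self, update: str) -> int:
        """Write to the primary, forward to replicas, return the number of acks."""
        if not self.primary.apply(update):
            raise RuntimeError("primary is down")
        return sum(replica.apply(update) for replica in self.replicas)

    def failover(self) -> Node:
        """Promote the live replica with the longest log (assumed policy)."""
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("no live replica to promote")
        new_primary = max(candidates, key=lambda r: len(r.log))
        self.replicas.remove(new_primary)
        self.primary = new_primary
        return new_primary
```

A caller could require `write` to return at least one ack before confirming to the client (closer to synchronous replication) or ignore the count entirely (asynchronous replication, which risks losing the latest writes on failover).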

Trade-offs of Stateful Replication

Pros                                  | Cons
--------------------------------------+------------------------------------------------------
Data durability & fault tolerance     | Complexity in state synchronization
Enables high availability and backups | Risk of replication lag or inconsistency (async mode)
Resilient to node failures            | Harder to scale than stateless systems

Technologies Supporting Stateful Replication

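System        | Mechanism
--------------+------------------------------------------------------------
PostgreSQL    | Streaming replication of the write-ahead log (WAL)
Kafka         | Partition leader + in-sync replicas (ISR)
Cassandra     | Peer-to-peer with hinted handoff
Redis         | Asynchronous primary-replica replication; AOF/RDB for persistence
Raft-based DBs| Leader election, log replication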

Diagram of Stateful Replication

        +-------------------+
        |    Client App     |
        +-------------------+
                  |
                  v
       +---------------------+
       |    Primary Node     |  <-- handles all writes
       +---------------------+
        |   Replication    |
        |   (state sync)   |
        v                  v
+----------------+ +----------------+
| Replica Node 1 | | Replica Node 2 |
+----------------+ +----------------+
       (read-only or standby)

Example: PostgreSQL Streaming Replication

System Setup

  • 1 Primary PostgreSQL node
  • 2 Replica nodes
  • Replicas use streaming replication to stay up to date

Scenario

  • A financial application stores user transactions.
  • All writes go to the primary database.
  • The replica nodes copy write-ahead logs (WAL) and replay them to stay consistent.
  • If the primary node crashes, a replica is promoted using failover tools (like Patroni).

Flow

  1. User makes a transaction → goes to primary node.
  2. Primary saves the transaction and logs it.
  3. WAL logs are streamed to replicas.
  4. Replicas apply the changes and update their state.
  5. Read-only queries go to replicas for load distribution.
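
As a rough illustration of steps 2 to 4, the sketch below models the WAL as an append-only list that replicas replay from their last position. It is a simplification: real PostgreSQL log positions (LSNs) are byte offsets and records are binary, whereas here a position is just a record count. The class and method names are hypothetical.

```python
class Primary:
    def __init__(self):
        self.wal = []    # append-only list of (key, value) change records
        self.data = {}

    def commit(self, key, value):
        self.wal.append((key, value))  # log the change first (write-ahead)
        self.data[key] = value         # then apply it to local state

    @property
    def position(self) -> int:
        """Number of WAL records written so far."""
        return len(self.wal)


class Replica:
    def __init__(self):
        self.data = {}
        self.replayed = 0  # how many WAL records this replica has applied

    def stream_from(self, primary, max_records=None):
        """Fetch records after our position and replay them in order."""
        pending = primary.wal[self.replayed:]
        if max_records is not None:
            pending = pending[:max_records]
        for key, value in pending:
            self.data[key] = value
        self.replayed += len(pending)

    def lag(self, primary) -> int:
        """Records the primary has written that this replica has not applied."""
        return primary.position - self.replayed
```

A read sent to a replica whose `lag` is greater than zero may return stale data, which is why reads that must see the latest write are usually sent to the primary.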

Web Application Replication

Web applications typically store user session state (e.g., login info, cart contents, form data). If a session is held only in the memory of one server, the user's requests must be routed back to that same server for a consistent experience.

This leads to stateful web app replication where the app servers maintain state and need careful routing.

Sticky Sessions

Sticky sessions (also called session affinity) ensure that a user's requests are always routed to the same server where their session state is stored.

Use Case

  • Simple session management (no external session store).
  • Useful for small-scale deployments.

Trade-offs

  • Load imbalance (some servers may get overloaded).
  • Fails if the server crashes (session lost unless session replication is used).

Flow

User A sends login request

→ Routed to App Server 1
→ App Server 1 stores session in memory
→ Sticky session ensures all future requests go to App Server 1
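
One way a load balancer can achieve this is by hashing a stable client key to a server. The following is a minimal sketch of that idea, assuming a fixed server list; the server names and the `pick_server` function are hypothetical, and real load balancers offer their own configuration for this rather than user code.

```python
import hashlib

SERVERS = ["app-server-1", "app-server-2", "app-server-3"]  # hypothetical names


def pick_server(client_key: str, servers=SERVERS) -> str:
    """Map a client key (e.g., a session cookie value or client IP) to one server.

    The same key always maps to the same server as long as the list is unchanged.
    """
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Repeated requests from the same client land on the same server.
    print(pick_server("user-a") == pick_server("user-a"))
```

With plain modulo hashing, adding or removing a server remaps many clients to a different server, and their in-memory sessions are lost. This is the failure mode listed under the trade-offs.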

Tools

  • NGINX, HAProxy (both support sticky sessions, e.g., via cookies or IP hash)
  • AWS ELB (Application Load Balancer supports sticky sessions)

Session Clustering

Session clustering replicates or shares session data across all app server instances, so any server can handle any request, even after another server fails.

Benefits

  • High availability
  • No reliance on sticky sessions
  • Easy to scale horizontally

How it's implemented

  • In-memory data grids (e.g., Hazelcast, Apache Ignite)
  • Distributed session stores (e.g., Redis, Memcached)
  • Servlet container clustering (e.g., Tomcat session replication)

Flow

User A logs in on App Server 1

→ Session is saved in Redis
→ User's next request goes to App Server 2
→ App Server 2 retrieves session from Redis
→ Continues seamlessly
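
The flow above can be sketched with an in-memory class standing in for the external store. This is an assumption-laden illustration: `SessionStore` is a plain dictionary wrapper, not a Redis client, and `AppServer` is a hypothetical stand-in for a web application instance.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external session store such as Redis."""

    def __init__(self):
        self._sessions = {}

    def save(self, session_id: str, data: dict) -> None:
        self._sessions[session_id] = dict(data)

    def load(self, session_id: str):
        return self._sessions.get(session_id)


class AppServer:
    def __init__(self, name: str, store: SessionStore):
        self.name = name
        self.store = store  # shared, so the server keeps no session state itself

    def login(self, user: str) -> str:
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user, "cart": []})
        return session_id

    def handle(self, session_id: str) -> str:
        session = self.store.load(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: hello {session['user']}"


if __name__ == "__main__":
    store = SessionStore()
    server_1 = AppServer("app-server-1", store)
    server_2 = AppServer("app-server-2", store)
    session_id = server_1.login("user-a")
    print(server_2.handle(session_id))  # served by a different instance
```

Because the servers hold no session state of their own, either one can be restarted or removed without logging the user out, provided the shared store itself stays available.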

Database Replication

Databases also require stateful replication to maintain durability, consistency, and availability.

Common Replication Strategies

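Strategy        | Description
----------------+--------------------------------------------------------------
Primary-Replica | One primary handles writes; replicas sync for reads (PostgreSQL, MySQL)
Multi-Master    | Multiple nodes accept writes (e.g., Cassandra, which is leaderless; CockroachDB accepts writes on any node but coordinates them through Raft)
Quorum-Based    | Distributed consensus (e.g., Raft in etcd or Consul)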

Flow Example

  • Primary DB handles transaction writes
  • Data is streamed (e.g., via WAL logs) to one or more replica DBs
  • Read replicas handle heavy read operations (e.g., for analytics, reports)
  • On failure, a failover mechanism promotes a replica to become primary
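
A simple way to picture the read/write split and the failover step is a router that sits in front of the databases. The sketch below is hypothetical: the `Router` class, the node names, and the "starts with SELECT" test are illustrative assumptions, and real deployments usually rely on a proxy or driver feature instead.

```python
import itertools


class Router:
    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def route(self, sql: str) -> str:
        """Send SELECT statements to replicas (round-robin); everything else to the primary."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return next(self._next_replica)
        return self.primary

    def promote(self, replica: str) -> None:
        """Failover: the chosen replica becomes the new primary."""
        self.replicas.remove(replica)
        self.primary = replica
        self._next_replica = itertools.cycle(self.replicas)


if __name__ == "__main__":
    router = Router("db-primary", ["db-replica-1", "db-replica-2"])
    print(router.route("INSERT INTO txns VALUES (1)"))
    print(router.route("SELECT * FROM txns"))
    router.promote("db-replica-1")
    print(router.route("UPDATE txns SET amount = 5"))
```

The prefix check is deliberately naive: statements such as `SELECT ... FOR UPDATE`, or reads that must observe a write made moments earlier, would still need to go to the primary.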

Example Architecture

                  +-------------------------+
                  |      Load Balancer      |
                  +------------+------------+
                               |
         Sticky Sessions OR Stateless (Session Store)
                               |
         +---------------------+---------------------+
         |                     |                     |
         v                     v                     v
+------------------+  +------------------+  +------------------+
|   App Server 1   |  |   App Server 2   |  |   App Server 3   |
| (stores session  |  | (or shares via   |  | (Redis/Memcached |
|  or uses Redis)  |  | session cluster) |  |  or Hazelcast)   |
+--------+---------+  +--------+---------+  +--------+---------+
         |                     |                     |
         +---------------------+---------------------+
                               |
                               v
                     +-------------------+
                     |   Redis Cluster   |
                     +---------+---------+
                               |
                               v
                     +-------------------+
                     |    Primary DB     |
                     +---------+---------+
                               |
                               v
                     +-------------------+
                     |  Read Replica(s)  |
                     +-------------------+